Neural network enabled nanoplasmonic hydrogen sensors with 100 ppm limit of detection in humid air

Environmental humidity variations are ubiquitous, and high humidity characterizes the operating conditions of fuel cells and electrolyzers. Since hydrogen-air mixtures are highly flammable, humidity-tolerant H2 sensors are important from both safety and process-monitoring perspectives. Here, we report an optical nanoplasmonic hydrogen sensor operated at elevated temperature that, combined with Deep Dense Neural Network or Transformer data treatment of the entire spectral response of the sensor, enables a 100 ppm H2 limit of detection in synthetic air at 80% relative humidity. This significantly surpasses the <1000 ppm US Department of Energy performance target. Furthermore, the sensors pass the ISO 26142:2010 stability requirement in air at 80% relative humidity down to 0.06% H2 and show no signs of performance loss after 140 h of continuous operation. Our results thus demonstrate the potential of plasmonic hydrogen sensors for use in high humidity and show how neural-network-based data treatment can significantly boost their performance.

-All measurements were adjusted only for changes in simple temperature and humidity. Please briefly mention whether the adsorption of various toxic gases, deformation of microroughness on surface, and sensor characteristics can be preserved.
-The Limit of Detection (LoD) was determined through signal interpolation at three times the noise level. The noise level for each hydrogen concentration differs; kindly specify the hydrogen concentration at which the noise level was calculated.
-Please elaborate on the comparative advantages of this algorithmic technique over existing algorithms.
-Under the same measurement conditions, please provide and identify the primary cause of baseline drift.

Reviewer #2 (Remarks to the Author):
Tomeck et al. reports a deep dense neural network (DDNN) for high performing hydrogen sensors under humid air condition that qualifies requirements, which is one of the important issues in sensor communities. Through tailored analysis on various conditions, it is described that the sensitivity and stability are enhanced, along with clear sensing dynamics under humid condition. Moreover, the authors demonstrate key parameters for hydrogen sensor based on US department of Energy and detailed mechanism behind sensing dynamic behavior and good graphic elaboration. However, substantial modification is needed in the current version, including DDNN and other sensing performances such as the response speed and selectivity, before it is accepted in Nature Comm., as below.
2. The authors used the Pd/Au alloy disk with the composition of 70:30 and 197 nm diameter, 25 nm height structure for hydrogen sensor material. Why did the authors use the composition and structure of the sensor material for LSPR hydrogen sensing? Please provide the selection of the materials in this study. In addition, please describe the role of Au in Pd alloy in LSPR sensing.
3. As the authors mentioned, the humidity induces the deactivation effect on Pd, which results in base line drifting, signal instability, sensitivity drop, and decelerated response. However, Pd is also prone to oxidation in humid, high temperature condition, which can also cause aforementioned problems. I guess that the oxidation can be the problem for Pd/Au disk to detect hydrogen. If so, the chemical analysis before and after the humidity testing is needed.
4. Importantly, there are many criteria for the hydrogen sensor according to the US DoE. Among them, the response time for hydrogen sensor requires to be under few seconds since hydrogen is a very light gas that can be explosive around 4 % concentration that can leads to catastrophic disaster. However, in the methods section, DDNN requires the whole time series of optical spectra in the range of 400-1000 nm as input (pg 26, line 557). Therefore, practicality in terms of response time of DDNN for hydrogen sensor under humid condition must be discussed.
5. In addition to sensitivity, selectivity is also very important criteria for hydrogen gas sensor. Can DDNN be applied to various sensing environments other than humidity condition? For example, hydrogen car or station would require detection of hydrogen in the mixture gases that include volatile organic compounds and other gases such as CO2, CH4 and many more.

Reviewer #3 (Remarks to the Author):
• This paper reported an optical nanoplasmonic hydrogen sensor operated at an elevated temperature. A deep dense neural network trained by the spectral response of the sensor enables to decrease the limit of detection at 80 % relative humidity. The authors proposed an interesting method to solve the humidity interference and improve the limit of detection of the sensor. There are still many issues to be addressed in the manuscript.
• 1. The elevated temperature eliminates the humidity interference on the sensors is well known, but if the performance of the sensor can remain stable under high temperature and high humidity for days and months.
• 2. It is not accurate to deduce the LOD by calculating the signal at 3 times the noise level, the signal varies significantly at small concentrations. It is better to experimental measure the sensor at 200 ppm H2. Moreover, the noise level of the sensor should be presented and evaluated for different conditions, such as temperature and humidity.
• 3. The resolution of figures in the manuscript should be improved.
• 4. Using a neural network to improve the weak signal of the sensor requires extremely stable and repeatable sensors in different batches. If the sensors from different batches will obtain the same performance when applying the neural network. How many data sets are applied for training?
• 5. The data applied to train the neural network is obtained via the sensor by fixed concentration and relative humidity. If the data of the sensor for other combinations of concentration and relative humidity, such as 0.05% H2 concentration at 75% relative humidity, could verify the robustness of the neural network. The data with high humidity and low concentration can validate extreme conditions.

Reviewer #1
This manuscript presents the impact of humidity on sensor performances, focusing on a sensor that utilizes Pd70Au30 alloy nanoparticles as the sensing material. The study involves measuring the sensor's performance while varying both temperature and humidity levels. Using the collected data, an algorithm was developed to accurately evaluate the Limit of Detection (LoD) and sensitivity for hydrogen under all atmospheric conditions. This evaluation was based on analyzing the baseline, amplitude, and wavelength changes of the sensor. A major novelty of this research lies in the comprehensive examination of sensor characteristics under varying temperature and humidity conditions. Notably, the sensor's performances were found to remain consistent in diverse surrounding environments, thanks to the employment of a DDNN-based architecture. Follows are this reviewer's comments to improve the quality of the manuscript.
Comment 1: All measurements were adjusted only for changes in simple temperature and humidity. Please briefly mention whether the adsorption of various toxic gases, deformation of microroughness on surface, and sensor characteristics can be preserved.
Our reply: The Reviewer is correct that in this study we have focused specifically on temperature and humidity, since humidity remains a largely unresolved challenge. At the same time, in our previous works we have extensively investigated the compatibility of our sensors with toxic gases, such as CO and NO2, as well as with CO2 and CH4, and we have developed strategies that involve alloying with Cu in ternary PdAuCu nanoparticles and the use of polymeric coatings/nanocomposites to demonstrate that Pd-alloy-based plasmonic H2 sensors can be operated in toxic gas conditions (Optimization of the Composition of PdAuCu Ternary Alloy Nanoparticles for Plasmonic Hydrogen Sensing. ACS Applied Nanomaterials 2021, 4 (9), 8716-8722. DOI: 10.1021/acsanm.1c01242; Highly Permeable Fluorinated Polymer Nanocomposites for Plasmonic Hydrogen Sensing. ACS Applied Materials & Interfaces 2021, 13 (18), 21724-21732. DOI: 10.1021/acsami.1c01968; Bulk-Processed Pd Nanocube-Poly(methylmethacrylate) Nanocomposites as Plasmonic Plastics for Hydrogen Sensing. ACS Applied Nanomaterials 2020, 3 (8), 8438-8445. DOI: 10.1021/acsanm.0c01907; Rationally Designed PdAuCu Ternary Alloy Nanoparticles for Intrinsically Deactivation-Resistant Ultrafast Plasmonic Hydrogen Sensing. ACS Sensors 2019, 4, 1424-1432. DOI: 10.1021/acssensors.9b00610; Metal-Polymer Hybrid Nanomaterials for Plasmonic Ultrafast Hydrogen Detection. Nature Materials 2019, 18, 489-495. DOI: 10.1038/s41563-019-0325-4). This is clearly stated in the introduction section, where we write: "…high selectivity, and deactivation resistance towards O2, CO and NO2, have been demonstrated, the latter using both suitable alloy compositions and protective polymer coatings.3,13,14" Hence, we reason that including yet another systematic evaluation of sensor operation in toxic gases would be somewhat redundant, since we have already demonstrated multiple times that our sensors can be operated in these conditions. Furthermore, deactivation by species such as CO is the consequence of their strong binding to Pd. Accordingly, just as for H2O, which is the focus here, sensor operation at elevated temperature will mitigate CO poisoning and is thus beneficial in the same way.
With respect to "deformation of microroughness on the surface", we are not quite sure what exactly the Reviewer is referring to. However, since the PdAu nanoparticles used here are fabricated by applying a high-temperature annealing step, they do not exhibit significant surface roughness but are composed of a small number of crystallites with low-index facets (see Figure 1b and also Figure S17 in the SI). These structures are stable over very long times, as explicitly demonstrated in this work by the long-term stability measurement showcased in Figure 6. This is perhaps even more clearly demonstrated by the fact that the sensor used in this work has been exposed to high-humidity conditions for a total of 844 hours, over a period of more than 1.5 years, in the context of generating the data for this work.
To address this point, we have added the following text to the revised manuscript on page 20: "As the last aspect, we note that the sensor used throughout this work has spent a total of 844 hours on stream in high-humidity experiments during a period of more than 1.5 years, during which it was intermittently stored at ambient conditions. Yet, its response is unchanged and its performance is retained, which corroborates both its structural and surface chemical integrity over time."
Comment 2: The Limit of Detection (LoD) was determined through signal interpolation at three times the noise level. The noise level for each hydrogen concentration differs; kindly specify the hydrogen concentration at which the noise level was calculated.
Our reply: What the Reviewer is asking for here is precisely the information included in Fig. S10, which depicts the noise level, expressed as the standard deviation of the Δλpeak signal, plotted as a function of hydrogen concentration, relative humidity, and sensor operating temperature. These data show that the noise is independent of all these factors, which corroborates that the noise level is defined by optical noise, that is, the intrinsic readout noise of the spectrometer and fluctuations in the light source intensity.
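To make the two quantities in this exchange concrete, the sketch below computes a noise level as the sample standard deviation of a steady-state Δλpeak segment and then interpolates a calibration curve to the concentration at which the response equals three times that noise. It is a minimal illustration under our own assumptions (hypothetical function names, a response magnitude that increases monotonically with concentration, and linear interpolation in log10 of concentration), not the exact procedure behind Fig. S10 or the LoD values reported in the manuscript.

```python
import numpy as np


def noise_level(delta_lambda_segment):
    """Noise as the sample standard deviation (ddof=1) of a steady-state Δλpeak segment."""
    return float(np.std(np.asarray(delta_lambda_segment, dtype=float), ddof=1))


def lod_by_interpolation(concentrations, responses, sigma, k=3.0):
    """Concentration at which the calibration curve reaches k * sigma.

    Assumes |responses| increase monotonically with concentration and
    interpolates linearly in log10(concentration). np.interp clamps to the
    end points, so a threshold outside the measured range returns the
    smallest or largest measured concentration instead of extrapolating.
    """
    c = np.asarray(concentrations, dtype=float)
    r = np.abs(np.asarray(responses, dtype=float))
    order = np.argsort(r)  # np.interp needs increasing x-coordinates
    log_c = np.interp(k * sigma, r[order], np.log10(c[order]))
    return float(10.0 ** log_c)


# Hypothetical calibration: |Δλpeak| (nm) at four H2 concentrations (%).
conc = [0.1, 0.25, 0.5, 1.0]
resp = [0.1 * (np.log10(c) + 2.0) for c in conc]
sigma = 0.04  # stands in for noise_level(...) of a baseline segment, in nm
lod = lod_by_interpolation(conc, resp, sigma)  # ≈ 0.158 % H2
```

Because the noise in Fig. S10 is reported to be independent of concentration, RH and temperature, a single sigma per sensor is used here; if it were not, sigma would have to be evaluated at the concentration in question.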
Comment 3: Please elaborate on the comparative advantages of this algorithmic technique over existing algorithms.
Our reply: This is a very relevant request by the Reviewer, which we answer as follows. The DDNN-based algorithmic techniques employed in this study distinguish themselves from existing algorithms by their superior ability to process complex, nonlinear relationships within large datasets, which is essential when dealing with multifaceted sensor responses under varying environmental conditions. Unlike traditional algorithms, which typically require manual feature selection or are limited to linear input-output mappings, the DDNN can automatically discern intricate patterns in the data and thus allows for a more nuanced and accurate detection of hydrogen levels across a range of humidities and temperatures. Its robustness to noise and capability to learn from ambiguous signals further enhance its utility, particularly in real-world applications where sensor data may be imperfect or incomplete. With regard to its advantages over other deep-learning-based techniques for nanoplasmonic hydrogen sensing, to our knowledge there are no such techniques in the published literature, which makes a direct comparison in this specific respect impossible. Nevertheless, to discuss this issue in the manuscript, we have added the following text to its revised version: "We chose this specific DDNN algorithmic technique because it offers comparative advantages over traditional data analysis methods, most notably through its capacity to autonomously recognize complex patterns within the sensor data, even under complex and variable environmental influences. This approach allows for a more nuanced characterization of sensor performance, particularly in challenging conditions that involve non-linear interactions between multiple variables, as in the present case. Furthermore, its robustness to noise and ambiguous signals renders it especially suitable for real-world applications, such as H2 sensors."
Comment 4: Under the same measurement conditions, please provide and identify the primary cause of baseline drift.
Our reply: We have added the following sentence to the revised manuscript: "As the main cause for this drift, we identify long-term intensity variations of the halogen light source used."

Reviewer #2:
Tomeck et al. reports a deep dense neural network (DDNN) for high performing hydrogen sensors under humid air condition that qualifies requirements, which is one of the important issues in sensor communities. Through tailored analysis on various conditions, it is described that the sensitivity and stability are enhanced, along with clear sensing dynamics under humid condition. Moreover, the authors demonstrate key parameters for hydrogen sensor based on US department of Energy and detailed mechanism behind sensing dynamic behavior and good graphic elaboration. However, substantial modification is needed in the current version, including DDNN and other sensing performances such as the response speed and selectivity, before it is accepted in Nature Comm., as below.
Comment 1: The authors have to provide the reasons and advantages behind the DDNN approach for performance enhancement compared to other deep learning methods in the main text.
Our reply: This is indeed a relevant request and very similar to Comment #3 by Reviewer #1. We therefore reproduce here our response given above to Comment #3 by Reviewer #1: The DDNN-based algorithmic techniques employed in this study distinguish themselves from existing algorithms by their superior ability to process complex, nonlinear relationships within large datasets, which is essential when dealing with multifaceted sensor responses under varying environmental conditions. Unlike traditional algorithms, which typically require manual feature selection or are limited to linear input-output mappings, the DDNN can automatically discern intricate patterns in the data and thus allows for a more nuanced and accurate detection of hydrogen levels across a range of humidities and temperatures. Its robustness to noise and capability to learn from ambiguous signals further enhance its utility, particularly in real-world applications where sensor data may be imperfect or incomplete. With regard to its advantages over other deep-learning-based techniques for nanoplasmonic hydrogen sensing, to our knowledge there are no such techniques in the published literature, which makes a direct comparison in this specific respect impossible. Nevertheless, to discuss this issue in the manuscript, we have added the following text to its revised version: "We chose this specific DDNN algorithmic technique because it offers comparative advantages over traditional data analysis methods, most notably through its capacity to autonomously recognize complex patterns within the sensor data, even under complex and variable environmental influences. This approach allows for a more nuanced characterization of sensor performance, particularly in challenging conditions that involve non-linear interactions between multiple variables, as in the present case. Furthermore, its robustness to noise and ambiguous signals renders it especially suitable for real-world applications, such as H2 sensors."
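To illustrate what processing the entire spectral response means, as opposed to reading out a single hand-picked feature such as λpeak, the sketch below shows the forward pass of a generic deep dense (fully connected) network that maps one spectrum to one concentration estimate. Everything in it is an assumption for illustration only: the layer sizes, the 601-point wavelength grid (400-1000 nm at 1 nm steps), He initialization and ReLU activations. Training is omitted, and this is not the architecture reported in the manuscript.

```python
import numpy as np


def init_dense(sizes, seed=0):
    """He-initialised (weights, bias) pairs for a stack of fully connected layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def ddnn_forward(params, spectrum):
    """Map one full spectrum (every wavelength) to a scalar H2 concentration estimate."""
    h = np.asarray(spectrum, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers mix all wavelengths
    W, b = params[-1]
    return float((h @ W + b)[0])  # linear output layer for regression


# Hypothetical sizes: 601 wavelengths in, three hidden layers, one output.
params = init_dense([601, 256, 128, 64, 1])
spectrum = np.linspace(0.2, 0.8, 601)  # placeholder extinction spectrum
estimate = ddnn_forward(params, spectrum)  # untrained, so the value is meaningless
```

In a trained model the weights would be fitted to labelled spectra, for example by gradient descent on a mean-squared-error loss; the point of the sketch is only that every wavelength enters the first layer, so no feature has to be selected by hand.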
Comment 2: The authors used the Pd/Au alloy disk with the composition of 70:30 and 197 nm diameter, 25 nm height structure for hydrogen sensor material. Why did the authors use the composition and structure of the sensor material for LSPR hydrogen sensing? Please provide the selection of the materials in this study. In addition, please describe the role of Au in Pd alloy in LSPR sensing.
Our reply: As we mention explicitly in the first sentence of the Results and Discussion section, we have studied the PdAu alloy system, as well as the role of Au, in detail in previous publications. Specifically, we had written: "For our study, we chose to work with the Pd70Au30 alloy system that we have investigated in detail earlier and for which we have identified excellent sensing performance at dry conditions 13,18,19." In the cited references, we demonstrated that adding Au to Pd effectively eliminates the intrinsic hysteresis of hydride formation/decomposition characteristic of Pd by lowering the critical temperature of the system, and that adding 30 % Au is the best compromise between completely eliminating hysteresis, establishing a linear optical response to hydrogen and maximizing the optical contrast per unit sorbed hydrogen. To clarify these points, we have added a sentence, such that the introduction to the Results and Discussion section now reads: "For our study, we chose to work with the Pd70Au30 alloy system that we have investigated in detail earlier and for which we have identified excellent sensing performance at dry conditions 13,18,19. Alloying Pd with 30 % Au effectively eliminates the intrinsic hysteresis characteristic of pure Pd by lowering the critical temperature of the system, and it is the best compromise between completely eliminating hysteresis, establishing a linear optical response to hydrogen and maximizing optical contrast per unit sorbed hydrogen."
Comment 3: As the authors mentioned, the humidity induces the deactivation effect on Pd, which results in base line drifting, signal instability, sensitivity drop, and decelerated response. However, Pd is also prone to oxidation in humid, high temperature condition, which can also cause aforementioned problems. I guess that the oxidation can be the problem for Pd/Au disk to detect hydrogen. If so, the chemical analysis before and after the humidity testing is needed.
Our reply: We agree with the Reviewer that Pd can oxidize at elevated temperatures. However, severe oxidation requires significantly higher temperatures (see e.g. Surface Science 2006, 600 (5), 983-994). In other words, at the temperatures used in this work, only very mild surface oxidation of Pd takes place, and this oxide is readily and immediately reduced as soon as it is exposed to H2, already at ambient conditions (and even more efficiently at elevated temperatures), since Pd oxide also dissociates H2 efficiently.
Comment 4: Importantly, there are many criteria for the hydrogen sensor according to the US DoE. Among them, the response time for hydrogen sensor requires to be under few seconds since hydrogen is a very light gas that can be explosive around 4 % concentration that can leads to catastrophic disaster. However, in the methods section, DDNN requires the whole time series of optical spectra in the range of 400-1000 nm as input (pg 26, line 557). Therefore, practicality in terms of response time of DDNN for hydrogen sensor under humid condition must be discussed.
Our reply: This is indeed an important point raised by the Reviewer, which we address as follows. While our DDNN approach offers substantial improvements in detection accuracy under varying humidity conditions, it is indeed imperative to consider the implications for its response time.
The DDNN is designed to analyze time series of optical spectra, which may introduce a delay, as it requires a full sequence of measurements in the spectral range 400-1000 nm for accurate analysis. However, the computational time for analyzing a single newly acquired spectrum is within a few seconds, since it is limited only by the acquisition hardware. Moreover, the sensor system is capable of continuous spectral data acquisition, allowing the DDNN to process incoming data in near-real-time. Hence, the DDNN, when integrated with the sensor's hardware, is configured to meet the critical safety benchmarks for timely hydrogen detection, even in highly humid environments. Future work will focus on further algorithmic refinements and hardware integration to minimize latency and ensure that the sensor system consistently delivers the quick response times mandated for safe hydrogen sensing applications.
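The latency argument can be made concrete with a small buffering sketch. Following the revised manuscript text quoted in this reply, it assumes that the Transformer consumes 4 consecutive spectra and that a new spectrum arrives roughly every 3 s, so the first prediction is available once the buffer is full, after (4 − 1) × 3 s = 9 s, and every later spectrum yields a new prediction; a single-spectrum DDNN corresponds to a window of 1 with no on-lining delay. The class, the helper function and the `model` callable are hypothetical placeholders for a trained network, not the code used in this work.

```python
from collections import deque
from typing import Callable, Optional, Sequence

import numpy as np


class StreamingPredictor:
    """Feed the most recent `window` spectra to a sequence model (hypothetical helper)."""

    def __init__(self, model: Callable[[np.ndarray], float], window: int = 4):
        self.model = model  # maps an array of shape (window, n_wavelengths) to a concentration
        self.window = window
        self.buffer: deque = deque(maxlen=window)  # oldest spectrum drops out automatically

    def push(self, spectrum: Sequence[float]) -> Optional[float]:
        """Add one newly acquired spectrum; return a prediction once the buffer is full."""
        self.buffer.append(np.asarray(spectrum, dtype=float))
        if len(self.buffer) < self.window:
            return None  # still in the initial on-lining period
        return float(self.model(np.stack(list(self.buffer))))


def onlining_time(window: int, sampling_interval_s: float) -> float:
    """Time from the first spectrum until the first prediction is available."""
    return (window - 1) * sampling_interval_s


# Usage with a dummy model that averages the buffered spectra.
predictor = StreamingPredictor(model=lambda w: float(w.mean()), window=4)
outputs = [predictor.push(np.full(601, t)) for t in range(5)]
# outputs: [None, None, None, 1.5, 2.5]
print(onlining_time(window=4, sampling_interval_s=3.0))  # 9.0
```

The fixed-length deque keeps the per-spectrum cost constant, so after on-lining the delay per prediction is one model evaluation plus the acquisition time of one spectrum.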
We have added the following text to the revised manuscript: "Looking forward, with respect to sensor response time, which is not explicitly addressed in this work yet is another key performance metric for H2 sensors, we note that the DDNN model used in the first part of this work is structured to require only a single time step, that is, a single spectrum, for H2 concentration prediction. The Transformer model used in the second part requires a continuous readout of 4 time steps for its prediction, which, with the sampling interval of roughly 3 seconds considered here, results in the Transformer delivering fully real-time results after an initial on-lining period of roughly 9 seconds. Hence, both types of models are essentially limited only by the acquisition hardware and thus designed to ensure fast response times, provided that the sensor itself can deliver them. To this end, we have recently demonstrated that plasmonic H2 sensors based on the Pd70Au30 alloy system can indeed provide sub-second response, as required by the corresponding US DoE performance target,13 at least at idealized vacuum/pure H2 conditions."
Comment 5: In addition to sensitivity, selectivity is also very important criteria for hydrogen gas sensor. Can DDNN be applied to various sensing environments other than humidity condition? For example, hydrogen car or station would require detection of hydrogen in the mixture gases that include volatile organic compounds and other gases such as CO2, CH4 and many more.
Our reply: The Reviewer is correct that hydrogen sensors will be operated in environments where multiple molecular species are present, and therefore selectivity is very important. It is one of the key advantages of sensors based on hydride-forming materials, such as the one considered in this work, that their sensing mechanism is the absorption of hydrogen into interstitial lattice sites, since this mechanism makes them intrinsically highly selective to hydrogen (no other species are absorbed into the material and give rise to a large response). We have explicitly demonstrated this in our earlier work, specifically for CO2 and CH4, which the Reviewer asks about, but also for O2, CO and NO2 (Ref 13, Nat. Mater. 2019, 18 (5), 489-495. https://doi.org/10.1038/s41563-019-0325-4). Since we had not explicitly mentioned CO2 and CH4 in the corresponding sentence in the introduction section, we have now added them, and the sentence reads: "…high selectivity, and deactivation resistance towards O2, CO2, CH4, CO and NO2, have been demonstrated, the latter using both suitable alloy compositions and protective polymer coatings.3,13,14" When it comes to applying the DDNN in various sensing environments, which is a very relevant question asked by the Reviewer, we note that one of the biggest benefits of using a deep learning approach, such as a DDNN, is that it does not require any assumptions about, or knowledge of, the underlying physics, which makes the same approach generally usable in different conditions. In other words, it does not really matter what environment the sensor is exposed to, as long as appropriate data for training the DDNN in these environments are provided. To make this point clear, we have added the following sentence to the revised manuscript: "Furthermore, we highlight that one of the biggest benefits of using deep learning, such as a DDNN or Transformer, to improve the performance of a sensor is that this approach, in principle, does not require strict assumptions or prior knowledge about the underlying sensing mechanism. This ensures that the same approach is generally usable in different sensing conditions if appropriate data for training in these conditions are provided."

Reviewer #3
This paper reported an optical nanoplasmonic hydrogen sensor operated at an elevated temperature. A deep dense neural network trained by the spectral response of the sensor enables to decrease the limit of detection at 80 % relative humidity. The authors proposed an interesting method to solve the humidity interference and improve the limit of detection of the sensor. There are still many issues to be addressed in the manuscript.
Comment 1: The elevated temperature eliminates the humidity interference on the sensors is well known, but if the performance of the sensor can remain stable under high temperature and high humidity for days and months.
Our reply: This is a relevant question that has also been raised in Comment #1 by Reviewer #1. We therefore reproduce here, for convenience, the response given to that comment: The PdAu nanoparticle structures used here are stable over very long times, as explicitly demonstrated in this work by the long-term stability measurement showcased in Figure 6. This is perhaps even more clearly demonstrated by the fact that the sensor used in this work has been exposed to high-humidity conditions for a total of 844 hours, over a period of more than 1.5 years, in the context of generating the data for this work.
To address this point, we have added the following text to the revised manuscript on page 20: "As the last aspect, we note that the sensor used throughout this work has spent a total of 844 hours on stream in high-humidity experiments during a period of more than 1.5 years, during which it was intermittently stored at ambient conditions. Yet, its response is unchanged and its performance is retained, which corroborates both its structural and surface chemical integrity over time."
Comment 2: It is not accurate to deduce the LOD by calculating the signal at 3 times the noise level, the signal varies significantly at small concentrations. It is better to experimental measure the sensor at 200 ppm H2.
Our reply: We agree with the Reviewer that it is likely better practice to not only extrapolate but also measure down to the extrapolated value. In the context of our major effort to address the Reviewer's Comment #5 below, we have extended our measured range not only down to 200 ppm but all the way to 100 ppm. Hence, the specific reply to this comment is included in the reply to Comment #5 below, to which we refer for more details.
Moreover, the noise level of the sensor should be presented and evaluated for different conditions, such as temperature and humidity.
Our reply: As already mentioned in response to a similar comment by Reviewer #1 (Comment #2), we have already done this, since this information is included in Fig. S10, which depicts the noise level, expressed as the standard deviation of the Δλpeak signal, plotted as a function of hydrogen concentration, relative humidity, and sensor operating temperature. These data show that the noise is independent of all these factors, which corroborates that the noise level is defined by optical noise, that is, the intrinsic readout noise of the spectrometer and fluctuations in the light source intensity.
Comment 3: The resolution of figures in the manuscript should be improved.
Our reply: We are not exactly sure what the Reviewer is referring to here but presume it relates to the resolution of the figures in the PDF file. We will make sure that the resolution of our original figures is high enough to guarantee high resolution in the final PDF.
Comment 4: Using a neural network to improve the weak signal of the sensor requires extremely stable and repeatable sensors in different batches.If the sensors from different batches will obtain the same performance when applying the neural network.How many data sets are applied for training?
Our reply: Question is very unclear. Provide an argument for how much data is used to train the network, and how much would be needed to generalize across different batches.
Comment 5: The data applied to train the neural network is obtained via the sensor by fixed concentration and relative humidity. If the data of the sensor for other combinations of concentration and relative humidity, such as 0.05% H2 concentration at 75% relative humidity, could verify the robustness of the neural network. The data with high humidity and low concentration can validate extreme conditions.

Our reply: This is a very relevant and important comment by the Reviewer, which has inspired us to execute two new sets of experiments. In the first one, we executed a new H2 pulse sequence (using the same concentration pulses as in the work so far) but at RH values in between the ones used for the original Transformer model training.
We then analyzed these data using both the old model, which was not trained on these humidities, and a retrained model trained on an enriched dataset that also includes the intermediate RH values. Following the same line, we performed a second new experiment in which we extended the H2 pulses to significantly lower concentrations than the ones tested so far, i.e., from 0.06 % to 0.01 % as the lowest concentration.
To address all this, we have added the following new section with two new figures to the revised manuscript:

Transformer response in (untrained) intermediate RH and down to 0.01 % (100 ppm) H2
The performance of machine learning methods in general, and of both the DDNN and Transformer models we use in this study, inherently depends strongly on the quality of the data used for training. Furthermore, it is intuitive that a deep learning model will perform worse when making predictions at conditions that differ significantly from the training conditions than when the data to be analyzed are generated within the range of the training conditions. It is therefore important to address this aspect and discuss its implications for neural-network-enabled plasmonic H2 sensors. Here, we do this in two steps: first, by assessing sensor performance at RH levels intermediate to the ones the Transformer was trained on, and second, by expanding our sensing range to H2 concentrations below the lowest value explored so far, i.e., down to 0.01 % H2, across the full humidity range up to 80 % RH.
To assess the ability of the Transformer to handle a sensor environment characterized by RH levels intermediate to the ones used for its initial training, we again executed the ISO 26142:2010 H2 concentration pulse sequence introduced above at 80 °C, but with intermediate RH values of 0, 20, 35, 50, 75, 85 % (Figure 7a). Plotting first the standard λpeak readout reveals the expected behavior, with an increasing magnitude of negative response as RH increases and a fully recovered sensor response when returning to dry operating conditions (Figure 7b). Applying the old Transformer model, that is, the model trained at the original (and thus different) RH values, reveals that it is able to reasonably predict the high-concentration H2 pulses but falls short in identifying the lowest concentrations (Figure 7c). This is not surprising, because the model's predictive accuracy is contingent on the diversity of the training dataset; i.e., for predicting low H2 concentrations, the model will be sensitive to the particular noise characteristics at inference time. At the same time, this reduced performance is easily mitigated by re-training the Transformer on a dataset enriched with the new RH levels and relevant noise conditions, which fully recovers its predictive performance also at the intermediate RH values, all the way down to the smallest pulse of 0.06 % H2 (Figure 7d). The new data were incorporated into the training of the model with input-label pairs analogously to the original datasets above (Figure S12), consisting of the sequence of on-ramps of increasing H2 concentration and off-ramps of decreasing H2 concentration.
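The construction of input-label pairs and the enrichment of the training set can be sketched as follows. This is a minimal illustration under our own assumptions: each input is a window of 4 consecutive spectra (matching the 4-time-step Transformer input described in the reply to Reviewer #2), the label is the set-point H2 concentration at the last spectrum of the window, and runs recorded at different RH values are simply concatenated. The function names are hypothetical; the actual pairing used for the model follows Figure S12 of the manuscript.

```python
import numpy as np


def make_windows(spectra, concentrations, window=4):
    """Slice one measurement run into (input, label) pairs (hypothetical helper).

    spectra: shape (n_steps, n_wavelengths), one spectrum per time step.
    concentrations: shape (n_steps,), set-point H2 concentration (%) per step.
    Each input is `window` consecutive spectra; its label is the set-point
    concentration at the last spectrum of that window.
    """
    spectra = np.asarray(spectra, dtype=float)
    concentrations = np.asarray(concentrations, dtype=float)
    n_steps = spectra.shape[0]
    if n_steps < window:
        raise ValueError("run is shorter than one window")
    # Row i of idx holds the time indices i, i+1, ..., i+window-1.
    idx = np.arange(window)[None, :] + np.arange(n_steps - window + 1)[:, None]
    X = spectra[idx]  # shape (n_windows, window, n_wavelengths)
    y = concentrations[window - 1:]  # label aligned with the last step of each window
    return X, y


def enrich(runs, window=4):
    """Concatenate windowed pairs from several runs, e.g. original and intermediate RH."""
    pairs = [make_windows(s, c, window) for s, c in runs]
    X = np.concatenate([p[0] for p in pairs])
    y = np.concatenate([p[1] for p in pairs])
    return X, y
```

Windows are built per run rather than after concatenating runs, so no input ever straddles two separate measurements recorded at different RH.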

d) Correspondingly obtained Transformer-based readout from a re-trained model that has thus also seen the sensor response at the intermediate RH values during training. e) Sensor LoD as obtained by the standard λpeak readout for the different RH, as defined by signal extrapolation (orange) and by the smallest directly measured H2 pulse that could be discerned within 3 standard deviations (red). Note that above 20 % RH, consistent with the results in Figure 4, the sensor falls short of the US DoE target of LoD < 0.1 % H2. f) Sensor LoD as obtained by the Transformer-based readout using the original training, i.e., at RH values different from those used in the measurements here. Note that while the prediction accuracy at the smallest H2 concentrations is lower than in the original RH tests (cf. Figure 4c), the precision remains very high, effectively retaining a low estimated LoD. However, the results depend on the precise noise characteristics at inference time and thus lead to an inconsistent measurement of low H2 concentration pulses. g) Sensor LoD as obtained by the Transformer-based readout after re-training on the enriched dataset that also includes the intermediate RH values, revealing again an essentially RH-independent LoD that lies significantly below the DoE target of 0.1 % (grey shaded area). The LoD estimation procedure is explained in Figure S16.
To further investigate the Transformer performance at intermediate RH, we extract the LoD of the sensor in three different ways, i.e., using the standard λpeak readout (Figure 7e), the old Transformer model (Figure 7f) and the re-trained Transformer model (Figure 7g). Furthermore, we apply two distinct definitions of the LoD. The first is simply the smallest discrete H2 concentration that could be directly measured (λpeak readout) or predicted (Transformer) within 3 standard deviations of certainty. The second is obtained by extrapolation, i.e., by fitting the standard deviation σ of the measured λpeak readout or of the Transformer prediction as a logarithmic function of H2 concentration and then identifying the lowest H2 concentration that can be extrapolated with a precision of 3σ, as described in Figure S16. For the λpeak readout, as already seen above (cf. Figure 4a), we find that the LoD increases with humidity and fails to meet the DoE target at higher RH levels for both LoD definitions, as the sensor's response to low H2 concentrations becomes less distinguishable from the baseline noise (Figure 7e).
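The two LoD definitions can be sketched as follows. This is our reading of the text, not the procedure of Figure S16: we assume that the discrete LoD is the smallest tested concentration whose mean prediction exceeds zero by more than three standard deviations, and that the extrapolated LoD is the smallest concentration c satisfying c ≥ 3σ(c), with σ(c) = a + b·ln c fitted to the per-pulse standard deviations. The function names, the grid search and its bounds are illustrative choices.

```python
import numpy as np


def discrete_lod(conc, preds):
    """Smallest tested concentration whose predictions are resolved from zero.

    Assumed criterion: mean(prediction) - 3 * std(prediction) > 0.
    conc:  tested set-point concentrations (% H2)
    preds: one 1-D array of repeated predictions per tested concentration
    Returns None if no concentration is resolved.
    """
    resolved = [c for c, p in zip(conc, preds)
                if np.mean(p) - 3 * np.std(p, ddof=1) > 0]
    return min(resolved) if resolved else None


def fitted_lod(conc, sigma, c_min=1e-4, c_max=10.0, n_grid=100_000):
    """Extrapolated LoD from a logarithmic fit sigma(c) = a + b * ln(c).

    Returns the smallest c on a log-spaced grid where the fitted sigma is
    positive and c >= 3 * sigma(c), or None if no grid point qualifies.
    Caution: if the fitted sigma falls towards low c, the result is set by
    the zero crossing of the fit rather than by the measured noise.
    """
    b, a = np.polyfit(np.log(conc), sigma, 1)  # polyfit returns slope first
    grid = np.logspace(np.log10(c_min), np.log10(c_max), n_grid)
    sigma_fit = a + b * np.log(grid)
    ok = (sigma_fit > 0) & (grid >= 3 * sigma_fit)
    return float(grid[np.argmax(ok)]) if ok.any() else None


if __name__ == "__main__":
    conc = np.array([0.06, 0.1, 0.2, 0.5, 1.2])  # % H2, illustrative values
    sigma = 0.004 - 0.002 * np.log(conc)        # synthetic per-pulse std devs
    print(fitted_lod(conc, sigma))              # ~0.0326
    preds = [np.array([0.0, 0.02, -0.01, 0.01]), np.array([0.058, 0.06, 0.062])]
    print(discrete_lod([0.01, 0.06], preds))    # 0.06
```

With this reading, the fitted LoD can fall below the smallest tested pulse (extrapolation), whereas the discrete LoD can never be lower than a concentration that was actually measured.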
For the old Transformer model, we find that it retains high precision but that its accuracy in predicting the lowest H2 concentrations declines with increasing RH, reflecting the model's constraints when extrapolating beyond its training conditions (Figure 7f). Here, the highest H2 concentration that cannot be discerned from noise at all (i.e., for which the model predicts 0 % H2 within 3σ) defines the smallest possible estimated LoD. This leads to identical extrapolated (fit) and measured discrete LoDs for the old Transformer model. For the re-trained model, we find a consistent, RH-independent LoD that is well below the DoE target across all humidity levels for the discrete values, and even lower for the extrapolated LoDs based on the logarithmic fit. This corroborates the model's improved robustness and predictive power after incorporating the intermediate RH values into its training dataset (Figure 7g).
As the final step to test the performance of the Transformer model outside its initial training regime, we executed a pulse sequence in synthetic air at 80 °C with H2 pulses ranging from 0.01 % to 0.2 % H2 for RH = 0, 20, 50 and 80 % (Figure 8a). In other words, we extend the lower concentration limit of the pulses from the originally lowest value of 0.06 % H2 to 0.01 % H2. Applying the standard λpeak readout reveals small but distinct blue-shifts for the small H2 pulses and red-shifts for the largest pulses, as expected (Figure 8b).
Applying the old Transformer model, trained only on data with H2 pulses down to 0.06 %, improves the response significantly but also clearly shows that the model falls short in distinctly predicting the new, lowest-concentration pulses (Figure 8c). This is not surprising because these concentrations lie below those included in training, and again the noise characteristics are typically different. Accordingly, the poor Transformer response to the lowest H2 pulses is easily alleviated by re-training the old model to also encompass data obtained in this low-concentration range, which enables reliable detection of H2 also at the lowest pulse of 0.01 %, or 100 ppm, H2 for all RH (Figure 8d). These new data were incorporated into the training of the model as input-label pairs, analogously to the original datasets above (Figure S12), consisting of the sequence of on-ramps of increasing H2 concentration and off-ramps of decreasing H2 concentration.

e) Sensor LoD as obtained by the standard λpeak readout for the different RH, as defined by signal extrapolation (orange) and by the smallest measured H2 pulse that could be discerned within 3 standard deviations (red). f) Sensor LoD as obtained by the Transformer-based readout using the old model. Note again that while the accuracy of the predictions of the smallest H2 concentrations is lower, the precision remains very high, effectively retaining a low estimated LoD. g) Sensor LoD as obtained by the Transformer-based readout after re-training on the given dataset, revealing again an essentially RH-independent LoD that lies significantly below the DoE target of 0.1 % and now extends down to 0.01 %, or 100 ppm, as the lowest directly measured H2 concentration. The LoD estimation procedure is described in Figure S16.
To finalize our analysis of this scenario as well, we extract the LoD of the sensor based on the standard λpeak readout (Figure 8e), the old Transformer model (Figure 8f) and the re-trained Transformer (Figure 8g), and again distinguish between the discrete LoD values, that is, the smallest measured H2 pulse that could be predicted within 3 standard deviations of certainty, and the values obtained by extrapolation based on a logarithmic fit, as described in Figure S16. For the λpeak readout, we find that the LoD again quickly increases with humidity, indicating a loss of sensitivity in more humid conditions that is especially pronounced for lower H2 concentrations (Figure 8e).
For the old Transformer model, we find that it provides high precision across all RH levels but is less accurate for lower H2 concentrations, owing to the lack of training data for these specific conditions (Figure 8f). This means that the estimated LoD, which is based on a continuous fit to the model's prediction precision for each H2 pulse discernible from noise, still reaches values comparable to the original estimates (cf. Figure 4c). The discrepancy seen between some of the discrete and fitted LoD values results from the underspecified training data, i.e., it is a consequence of applying the model to data obtained outside its training conditions.
For the re-trained Transformer model, however, we find a consistent LoD across the full range of RH levels, maintaining high precision and accuracy even at the lowest H2 concentrations (Figure 8g). This demonstrates the benefit of including a wider range of H2 concentrations in the training data. Furthermore, if the noise characteristics at inference time can be accounted for by re-training, the actual LoD can reach far below the DoE target of 0.1 % and here reaches a record-low LoD of 0.01 %, or 100 ppm, H2 in humid air at 80 % RH. The higher (fit) LoD at RH = 0 % is due to less precise predictions for all H2 levels included in the logarithmic fit, resulting in an extrapolated LoD that exceeds the lowest observable (discrete) H2 concentration, likely because the smallest H2 pulses induce a smaller spectral shift (Δλpeak) in dry conditions (cf. Figure 8b).
Taken together, the results of this section highlight two important and generic aspects of using deep learning to enhance sensor response, notably limited neither to plasmonic or hydrogen-targeting sensors nor to the specific type of deep learning model used. The first aspect is that any model will perform worse if the data it is to analyze were obtained at conditions different from those used to generate the training data. Naturally, the larger the difference between training and measurement conditions, the larger the decrease in performance. The second aspect is that this apparent shortcoming is easily mitigated by re-training the model.
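In practice, the first aspect suggests checking, before deployment, whether the intended operating conditions are covered by the training data. The sketch below is a minimal illustration under loud assumptions: "coverage" is judged only by the trained RH levels (with a user-chosen tolerance) and by the lowest trained H2 concentration, and the `TrainingEnvelope` type, the `retraining_reasons` function and the example values are hypothetical. Because the results above show degraded low-concentration predictions even at RH values between trained levels, the default tolerance is strict (zero).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingEnvelope:
    """Hypothetical summary of the conditions covered by the training data."""
    rh_levels: Tuple[float, ...]  # relative humidities (%) seen in training
    min_h2: float                 # lowest H2 concentration (%) seen in training


def retraining_reasons(env: TrainingEnvelope,
                       target_rh: Sequence[float],
                       target_min_h2: float,
                       rh_tol: float = 0.0) -> List[str]:
    """List the ways a planned deployment falls outside the training envelope.

    An empty list means this simplified check does not indicate re-training.
    rh_tol is the largest accepted distance (in % RH) to the nearest trained level.
    """
    reasons = []
    for rh in target_rh:
        nearest = min(env.rh_levels, key=lambda r: abs(r - rh))
        if abs(nearest - rh) > rh_tol:
            reasons.append(f"RH {rh} % is {abs(nearest - rh)} % from the nearest trained level")
    if target_min_h2 < env.min_h2:
        reasons.append(f"{target_min_h2} % H2 is below the lowest trained {env.min_h2} % H2")
    return reasons


if __name__ == "__main__":
    env = TrainingEnvelope(rh_levels=(0.0, 20.0, 50.0, 80.0), min_h2=0.06)  # example values
    for line in retraining_reasons(env, target_rh=[35.0, 80.0], target_min_h2=0.01):
        print(line)
```

Any non-empty result would point to the re-training route described in this section; the check itself says nothing about how much data the re-training needs.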
We argue that this need to train the model at the "right" conditions is not a problem from a technical sensor application perspective, since re-training is easily implemented if the conditions of the targeted application environment are known prior to sensor hardware deployment, a scenario that seems realistic in most cases. In fact, this approach may even provide new opportunities to significantly enhance the applicability of one and the same sensor hardware to widely different application conditions, since no changes to the hardware have to be made when adaptation to a specific sensing environment can be implemented on the basis of the output data treatment only, enabled by training conditions tailored to specific application environments." In addition, we added the following section to the Conclusions: "As the last key conclusion, we have shown that sensor performance based on the Transformer readout (and any other deep learning model) deteriorates when the sensing environment, here in terms of RH or H2 concentration range, differs from the conditions used to generate the training data. As the key point, however, we demonstrated how this is easily mitigated by re-training the model to also include these new conditions. In this way, we were able to achieve a record LoD of 0.01 %, or 100 ppm, H2 at RH = 80 % in air, and therefore to exceed the DoE target by one order of magnitude, notably with the potential for further improvement through further optimized training.
Looking forward, with respect to sensor response time, which is not explicitly addressed in this work yet is another key performance metric for H2 sensors, we note that the DDNN model we used in the first part of this work is structured to require only a single time step, that is, a single spectrum, for H2 concentration prediction. The Transformer model used in the second part requires a continuous readout of 4 time steps for its prediction, which, with the sampling interval of roughly 3 seconds considered here, results in the Transformer delivering fully real-time results after an initial start-up period of roughly 9 seconds. Hence, both types of models are essentially limited only by the acquisition hardware and are thus designed to ensure fast response times, provided that the sensor itself can deliver them. To this end, we have recently demonstrated that plasmonic H2 sensors based on the Pd70Au30 alloy system can indeed provide sub-second response, as required by the corresponding US DoE performance target (ref. 13), at least under idealized vacuum/pure H2 conditions."
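The start-up delay quoted above follows from simple arithmetic: a model that needs N consecutive spectra sampled every Δt can deliver its first prediction (N − 1)·Δt after the first spectrum, i.e., 3 × 3 s = 9 s for N = 4 and 0 s for the single-spectrum DDNN. The hypothetical `RollingWindow` helper below sketches how such a window could be maintained during continuous acquisition; it is an illustration under these assumptions, not the authors' acquisition software.

```python
from collections import deque

import numpy as np


class RollingWindow:
    """Hypothetical buffer holding the most recent `size` spectra for a sequence model."""

    def __init__(self, size: int):
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self._buf = deque(maxlen=size)  # oldest spectrum is dropped automatically

    def push(self, spectrum):
        """Add one spectrum; return the (size, n_wavelengths) window once full, else None."""
        self._buf.append(np.asarray(spectrum, dtype=float))
        if len(self._buf) < self.size:
            return None
        return np.stack(list(self._buf))


def startup_delay(window_steps: int, dt_s: float) -> float:
    """Time from the first spectrum until the first full input window is available."""
    return (window_steps - 1) * dt_s


if __name__ == "__main__":
    print(startup_delay(4, 3.0))  # Transformer, 4 spectra at ~3 s: 9.0
    print(startup_delay(1, 3.0))  # DDNN, single spectrum: 0.0
    window = RollingWindow(4)
    for t in range(6):
        x = window.push(np.full(5, float(t)))  # placeholder 5-point spectra
        print(t, None if x is None else x[:, 0])
```

After the start-up period, every new spectrum yields a new full window, so the prediction rate equals the acquisition rate.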

Figure 8. Transformer response to H2 concentrations down to 0.01 %, or 100 ppm, H2. a) The ISO 26142:2010 hydrogen safety sensor test protocol in synthetic air run at 80 °C with H2 pulses ranging from 0.01 % to 0.2 % H2, measured at RH = 0, 20, 50 and 80 %. b) Correspondingly obtained λpeak response, characterized by distinct blue-shifts for small H2 pulses and red-shifts for the largest pulses. c) Correspondingly obtained Transformer-based readout, obtained by directly applying the old Transformer model trained in the 0.06–1.2 % H2 concentration range for RH = 0, 20, 50 and 80 %. d) Correspondingly obtained Transformer-based readout from a re-trained model that has thus also seen the sensor response to the lowest H2 pulses during training. Note that now the lowest 0.01 % H2 concentration pulse is also predicted with high accuracy.